Preprocessing for PPM: Compressing UTF-8 Encoded Natural Language Text



Similar resources

Encoded Natural Language Text

In this paper, several new universal preprocessing techniques are described to improve Prediction by Partial Matching (PPM) compression of UTF-8 encoded natural language text. These methods adjust the alphabet in some manner (for example, by expanding or reducing it) before the compression algorithm is applied to the amended text. Firstly, a simple bigraphs (two-byte) subs...
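As a rough illustration of what "expanding the alphabet" before compression can look like, the sketch below replaces the most frequent two-byte pairs in UTF-8 text with single new symbols numbered from 256 upward. The abstract above is cut off, so this is only a generic example under stated assumptions and not necessarily the technique the paper describes; the function names, the `top_n` parameter, and the choice of symbol numbering are all hypothetical.

```python
from collections import Counter


def substitute_bigraphs(data: bytes, top_n: int = 100):
    """Replace the top_n most frequent byte pairs with new symbols >= 256.

    Hypothetical sketch: returns a list of integer symbols plus the
    substitution table needed to undo the transformation.
    """
    # Count every overlapping pair of adjacent bytes.
    pair_counts = Counter(zip(data, data[1:]))
    # Assign one new symbol (256, 257, ...) to each of the most frequent pairs.
    table = {
        pair: 256 + i
        for i, (pair, _) in enumerate(pair_counts.most_common(top_n))
    }
    symbols = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])
        if len(pair) == 2 and pair in table:
            symbols.append(table[pair])  # emit one symbol for two bytes
            i += 2
        else:
            symbols.append(data[i])      # emit the byte unchanged
            i += 1
    return symbols, table


def restore_bigraphs(symbols, table) -> bytes:
    """Invert substitute_bigraphs: expand each new symbol back to its pair."""
    inverse = {sym: pair for pair, sym in table.items()}
    out = bytearray()
    for sym in symbols:
        if sym in inverse:
            out.extend(inverse[sym])
        else:
            out.append(sym)
    return bytes(out)


text = "سلام سلام دنیا".encode("utf-8")
symbols, table = substitute_bigraphs(text, top_n=10)
assert restore_bigraphs(symbols, table) == text
```

The resulting symbol stream would then be handed to a compressor that accepts an alphabet larger than 256 symbols; that stage is outside this sketch.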


PPMexe: PPM for Compressing Software

With the emergence of software delivery platforms such as Microsoft’s .NET, code compression has become one of the core enabling technologies strongly affecting system performance. In this paper, we present PPMexe, a set of compression mechanisms for executables that explores their syntax and semantics to achieve superior compression rates. The foundation of PPMexe is the generic paradigm of pred...


Natural Language Compression on Edge-Guided text preprocessing

This paper presents Edge-Guided (E-G), an optimized text preprocessing technique for compression purposes. It transforms the original text into a word net, which stores all relationships between adjoining words. A specific directed graph is proposed to model this transformation: words are stored in vertices, whereas edges represent word transitions. Thus, the word net has a text representation ...


Semantic Information Preprocessing for Natural Language Interfaces to Databases

An approach is described for supplying selectional restrictions to parsers in natural language interfaces (NLIs) to databases by extracting the selectional restrictions from semantic descriptions of those NLIs. Automating the process of finding selectional restrictions reduces NLI development time and may avoid errors introduced by hand-coding selectional restrictions.


Discourse Strategies for Generating Natural-Language Text

If a generation system is to produce text in response to a given communicative goal, it must be able to determine what to include in its text and how to organize this information so that it can be easily understood. In this paper, a computational model of discourse strategies is presented that can be used to guide the generation process in its decisions about what to say next. The model is base...



Journal

Journal title: International Journal of Computer Science and Information Technology

Year: 2015

ISSN: 0975-4660,0975-3826

DOI: 10.5121/ijcsit.2015.7204